n Average - ent Learning
Author
Abstract
Average-reward reinforcement learning (ARL) is an undiscounted optimality framework that applies to a broad range of control tasks. ARL computes gain-optimal control policies, which maximize the expected payoff per step. However, gain optimality has intrinsic limitations as an optimality criterion. For example, it cannot distinguish between policies that all reach an absorbing goal state but incur different costs along the way. A more selective criterion is bias optimality, which filters the gain-optimal policies to select those that reach absorbing goals with minimum cost. Several ARL algorithms for computing gain-optimal policies have been proposed, but none of them can guarantee bias optimality, because this requires solving at least two nested optimality equations. In this paper, we describe a novel model-based ARL algorithm for computing bias-optimal policies. We test the proposed algorithm on an admission control queuing system and show that, by learning bias-optimal policies, it utilizes the queue much more efficiently than a gain-optimal method.
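To make the gain/bias distinction concrete, here is a minimal Python sketch. It is not the paper's algorithm, and it only evaluates fixed policies rather than solving the nested optimality equations. The four-state chain, its rewards (costs written as negative rewards), and the helper `evaluate_gain_bias` are hypothetical. The sketch uses relative value iteration with the absorbing goal as the reference state. Under the assumption that every state eventually reaches a goal whose reward equals the gain, the normalized relative values coincide with the bias. Both policies end up with gain 0, but their biases (the expected total cost before absorption) differ.

```python
from typing import Dict, Tuple

# Hypothetical toy model: under a fixed policy each state has one successor
# (deterministic transitions) and a one-step reward; costs are negative rewards.
Policy = Dict[int, Tuple[int, float]]  # state -> (next_state, reward)


def evaluate_gain_bias(policy: Policy, ref: int, tol: float = 1e-9,
                       max_iters: int = 10_000) -> Tuple[float, Dict[int, float]]:
    """Relative value iteration for a fixed policy (illustrative helper).

    Solves g + h(s) = r(s) + h(next(s)) with the normalization h(ref) = 0.
    If `ref` is an absorbing goal whose reward equals the gain, this h is
    the bias of the policy (assumption of this sketch)."""
    h = {s: 0.0 for s in policy}
    g = 0.0
    for _ in range(max_iters):
        # One backup: immediate reward plus the successor's current relative value.
        w = {s: r + h[nxt] for s, (nxt, r) in policy.items()}
        g = w[ref]  # the offset at the reference state estimates the gain
        new_h = {s: v - g for s, v in w.items()}
        if max(abs(new_h[s] - h[s]) for s in policy) < tol:
            return g, new_h
        h = new_h
    return g, h


GOAL = 3
# Policy A jumps from state 0 straight to the goal at a cost of 5.
direct = {0: (GOAL, -5.0), 1: (2, -1.0), 2: (GOAL, -1.0), GOAL: (GOAL, 0.0)}
# Policy B takes three unit-cost steps: 0 -> 1 -> 2 -> goal.
three_step = {0: (1, -1.0), 1: (2, -1.0), 2: (GOAL, -1.0), GOAL: (GOAL, 0.0)}

g_a, h_a = evaluate_gain_bias(direct, ref=GOAL)
g_b, h_b = evaluate_gain_bias(three_step, ref=GOAL)
print(g_a, g_b)        # 0.0 0.0   (gain cannot tell the policies apart)
print(h_a[0], h_b[0])  # -5.0 -3.0 (bias prefers the cheaper route)
```

Because the gain only measures the long-run payoff per step, it ignores the transient costs paid before absorption. The bias is exactly those transient costs, so it is the quantity that separates the two policies here.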